Levene’s test procedure is to perform a one-way ANOVA on the constructed response variable
\[ z_{ij} = \left| y_{ij} - \bar{y}_{i\cdot} \right| \]
If the F statistic is significant, the null hypothesis of homogeneity of variance is rejected.
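A minimal sketch of this constructed-variable ANOVA, assuming the SciPy stack and simulated data purely for illustration; the same test is available directly as `scipy.stats.levene(center="mean")`, so the two results should agree.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# hypothetical CRD data: three groups with different spreads
groups = [rng.normal(10, s, size=12) for s in (1.0, 1.5, 3.0)]

# constructed response z_ij = |y_ij - ybar_i.|
z = [np.abs(y - y.mean()) for y in groups]

F, p = stats.f_oneway(*z)                        # one-way ANOVA on the z_ij
W, p_lev = stats.levene(*groups, center="mean")  # Levene's test computed directly
print(F, p)
print(W, p_lev)                                  # matches the ANOVA on z
```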
Levene’s test is found in many statistical software applications. Many other tests have been proposed, and no single test is superior to all others under all conditions. The choice of which test to use is therefore a matter of which one works sufficiently well over the range of circumstances to which it will be applied. Milliken and Johnson (2009) review a number of tests and make recommendations based on simulation studies taken from the literature. They note that Levene’s test is better than Hartley’s or Bartlett’s test when the data are not normally distributed. However, in the second edition of their first volume they add consideration of two more tests which perform better than Levene’s test under certain conditions: O’Brien’s test works better when the underlying distributions are skewed, while the Brown-Forsythe test is better when the tails of the distribution are heavier.
The Hartley test is linked to the rule of thumb that the ratio of the maximum to the minimum group variance should be less than 4. In fact, the critical values tabulated for Hartley’s test are not always near 4; when the within-group sample sizes are small they can be considerably larger. This test is not commonly found in software.
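Hartley’s statistic is simply the ratio of the largest to the smallest sample variance, so the rule-of-thumb check is a one-liner; a small sketch with simulated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(2)
groups = [rng.normal(0, s, size=8) for s in (1.0, 1.2, 2.5)]  # hypothetical groups

variances = [np.var(y, ddof=1) for y in groups]
f_max = max(variances) / min(variances)   # Hartley's F_max: largest / smallest variance
print(f_max, "exceeds 4" if f_max > 4 else "within the rule of thumb")
```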
Bartlett’s test is found in many statistical software applications, but requires the assumption of normal populations which is often not certain.
O’Brien’s test follows the same procedure as Levene’s test in that a substitute value is computed for each \(y_{ij}\) and an analysis of variance is performed on the substitutes. The transformation is considerably more complicated than that used in Levene’s test and involves the choice of a weight parameter.
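The transformation itself is not reproduced here; the sketch below uses the form commonly implemented in statistical software, with the conventional weight \(w = 0.5\), and the `obrien` helper is illustrative rather than a definitive statement of the test. A useful internal check is that the group means of the transformed values equal the sample variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
groups = [rng.normal(5, s, size=10) for s in (1.0, 2.0, 3.0)]  # hypothetical data

def obrien(y, w=0.5):
    # O'Brien-style transformed values, as commonly implemented (assumed form)
    n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)
    return ((w + n - 2) * n * (y - ybar) ** 2 - w * (n - 1) * s2) / ((n - 1) * (n - 2))

r = [obrien(y) for y in groups]
print([ri.mean() for ri in r])           # group means of r equal the sample variances
print([y.var(ddof=1) for y in groups])
print(stats.f_oneway(*r))                # ANOVA on the transformed values
```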
The Brown-Forsythe test is a modification of Levene’s test and is described below.
If the data look like they come from a skewed distribution, Brown and Forsythe (1974) suggest that the ith group mean \(\bar{y}_{i\cdot}\) be replaced by the ith group median \(\tilde{y}_{i\cdot}\). This procedure therefore uses the response variable
\[z_{ij} = \left| y_{ij} - \tilde{y}_{i\cdot} \right| \]
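A minimal sketch, again with simulated (and deliberately skewed) data: the median-based construction is what SciPy’s `levene` computes by default with `center="median"`, so the hand-built ANOVA and the library call should match.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
groups = [rng.exponential(scale=s, size=15) for s in (1.0, 2.0, 4.0)]  # skewed, hypothetical

z = [np.abs(y - np.median(y)) for y in groups]
F, p = stats.f_oneway(*z)                         # ANOVA on |y_ij - median_i|
W, p_bf = stats.levene(*groups, center="median")  # Brown-Forsythe form (SciPy default)
print(F, p)
print(W, p_bf)
```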
The best-known option for dealing with heterogeneous variance in a completely randomized design (CRD) is to transform the response variable. There are situations where this does not yield a suitable response variable, but the F test procedure is quite robust to differences in the within-group variances, especially if the group sizes are nearly equal, or if the groups with the larger within-group variances are also those with the larger sample sizes.
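A minimal sketch of the transformation route, assuming a log transform is appropriate (it is for the simulated data below, where the spread grows with the mean); the choice of transformation in practice depends on the data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# hypothetical CRD where the standard deviation grows with the mean
groups = [rng.lognormal(mean=m, sigma=0.6, size=12) for m in (1.0, 1.4, 1.9)]

print(stats.levene(*groups, center="median"))      # heterogeneity on the raw scale
log_groups = [np.log(y) for y in groups]
print(stats.levene(*log_groups, center="median"))  # usually much improved after the log
print(stats.f_oneway(*log_groups))                 # one-way ANOVA on the transformed response
```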
Milliken and Johnson (2009, p. 38) suggest that the simple analysis for a CRD is fine unless Levene’s test is rejected at the 1% level. If it is rejected, then an adjusted analysis such as the Box (1954) correction or Welch’s (1951) test, both of which are described below, should be used instead.
Note: Milliken and Johnson (1992) suggested the Box (1954) procedure, but they dropped it in the second edition (Milliken and Johnson 2009).
Box’s method uses the standard ANOVA approach but changes the degrees of freedom for the critical F statistic. Given we have t treatment groups, each of size n and with observed variance \(\sigma_i^2\), we calculate
\[\bar{\sigma}^2 = \frac{\sum_i {\sigma_i^2}}{t} \]
and
\[c^2 = \frac{\sum_i {(\sigma_i^2 -\bar{\sigma}^2)^2}}{t\left( \bar{\sigma}^2\right)^2} \]
and then find numerator df as
\[\nu_1 = \frac{t-1}{1+c^2\left(\frac{t-2}{t-1}\right)} \]
and denominator df as
\[\nu_2 = \frac{t(n-1)}{1+c^2} \]
In a perfect world, where the \(\sigma_i^2\) values are all equal, the above formulae reduce to \(c^2=0\), \(\nu_1=t-1\), and \(\nu_2=t(n-1)\), as in the standard ANOVA. The most extreme case has \(c^2=t-1\), which leads to \(\nu_1=1\) and \(\nu_2 = n-1\); an ultra-conservative test would therefore use the critical value \(F_{\alpha,1,n-1}\).
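A sketch of the adjustment, assuming equal group sizes as in the formulas above; the observed variances and the F statistic come from simulated, hypothetical data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
t, n = 4, 10
groups = [rng.normal(10 + i, 1.0 + i, size=n) for i in range(t)]  # hypothetical CRD, equal n

s2 = np.array([np.var(y, ddof=1) for y in groups])    # observed within-group variances
s2_bar = s2.mean()                                    # sigma-bar squared
c2 = np.sum((s2 - s2_bar) ** 2) / (t * s2_bar ** 2)   # c^2 as defined above

nu1 = (t - 1) / (1 + c2 * (t - 2) / (t - 1))          # adjusted numerator df
nu2 = t * (n - 1) / (1 + c2)                          # adjusted denominator df

F_obs, _ = stats.f_oneway(*groups)                    # the usual ANOVA F statistic
print(F_obs, nu1, nu2, stats.f.sf(F_obs, nu1, nu2))   # p-value under the adjusted df
```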
For Welch’s (1951) test, define \(W_i = n_i/\hat{\sigma}_i^2\) and \[\bar{Y}^* = \frac{\sum_i {W_i \bar{Y}_{i\cdot}}} {\sum_i {W_i}} \]
Let \[\Lambda = \sum_i { \frac{ (1-W_i/W_\cdot)^2}{n_i -1}} \]
where \(W_\cdot=\sum_i{W_i}\).
The test statistic is then
\[F = \frac{\sum_i{ W_i \left(\bar{Y}_{i\cdot} - \bar{Y}^*\right)^2}/(t-1)}{ 1+2(t-2)\Lambda/(t^2-1)} \]
and is compared to the critical value from the F distribution with \(\nu_1=t-1\) and \(\nu_2 = (t^2-1)/(3\Lambda)\).
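The formulas translate directly into code; a sketch with simulated, unbalanced groups (the group sizes, means, and variances are arbitrary illustrations):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# hypothetical unbalanced groups with unequal variances
groups = [rng.normal(m, s, size=n) for m, s, n in ((10, 1, 8), (11, 2, 12), (13, 4, 10))]

t = len(groups)
n = np.array([len(y) for y in groups])
ybar = np.array([y.mean() for y in groups])
s2 = np.array([np.var(y, ddof=1) for y in groups])

W = n / s2                                      # W_i = n_i / sigma_hat_i^2
W_dot = W.sum()
ybar_star = np.sum(W * ybar) / W_dot            # weighted grand mean Y-bar*
Lam = np.sum((1 - W / W_dot) ** 2 / (n - 1))    # Lambda

F = (np.sum(W * (ybar - ybar_star) ** 2) / (t - 1)) / (1 + 2 * (t - 2) * Lam / (t**2 - 1))
nu1, nu2 = t - 1, (t**2 - 1) / (3 * Lam)
print(F, nu1, nu2, stats.f.sf(F, nu1, nu2))     # compare F to F(nu1, nu2)
```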
Milliken and Johnson (2009) provide a second approach for the adjusted test procedure but it is more difficult than the Welch procedure given here.
Box, George E. P. 1954. “Some Theorems on Quadratic Forms Applied in the Study of Analysis of Variance Problems.” Annals of Mathematical Statistics 25:290–302.
Brown, Morton B., and Alan B. Forsythe. 1974. “Robust Tests for the Equality of Variances.” Journal of the American Statistical Association 69:364–67.
Milliken, George A., and Dallas E. Johnson. 1992. Analysis of Messy Data: Volume 1 Designed Experiments. Boca Raton, Florida: Chapman & Hall/CRC Press.
———. 2009. Analysis of Messy Data: Volume 1 Designed Experiments. 2nd ed. Boca Raton, Florida: Chapman & Hall/CRC Press.
Welch, B. L. 1951. “On the Comparison of Several Mean Values: An Alternative Approach.” Biometrika 38:330–36.